Budgeted Reinforcement Learning in Continuous State Space

Neural Information Processing Systems

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the form of an upper bound on a constraint-violation signal that, importantly, can be modified in real time. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous state-space environments and unknown dynamics. We show that the solution to a BMDP is the fixed point of a novel Budgeted Bellman Optimality operator. This observation allows us to introduce natural extensions of Deep Reinforcement Learning algorithms to address large-scale BMDPs. We validate our approach on two simulated applications: spoken dialogue and autonomous driving.
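To make the setting concrete, the display below is a minimal sketch of the budgeted objective and of how a budgeted Bellman backup can be written. The symbols (r for the reward, c for the constraint-violation signal, beta for the budget, gamma for the discount) are generic notation chosen for illustration and are not necessarily the paper's exact formulation:

\[
\pi^\star(\cdot \mid s, \beta) \in \arg\max_{\pi} \; \mathbb{E}_{\pi}\Big[\textstyle\sum_{t \ge 0} \gamma^t r_t \,\Big|\, s_0 = s\Big]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi}\Big[\textstyle\sum_{t \ge 0} \gamma^t c_t \,\Big|\, s_0 = s\Big] \le \beta,
\]
\[
(\mathcal{T} Q)(s, \beta, a) = r(s, a) + \gamma \, \mathbb{E}_{s'} \, \mathbb{E}_{(a', \beta') \sim \pi_{\mathrm{greedy}}(\cdot \mid s', \beta)} \, Q(s', \beta', a'),
\]

where \(\pi_{\mathrm{greedy}}\) maximizes the reward component of \(Q\) subject to its constraint component not exceeding the remaining budget. Under this reading, the fixed point of such an operator is the budget-aware optimal value function mentioned in the abstract, which is what the Deep Reinforcement Learning extensions approximate in continuous state spaces.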


Reviews: Budgeted Reinforcement Learning in Continuous State Space

Neural Information Processing Systems

The introduction needs to mention that approaches like the latter *are* available solutions, and to frame the contribution of the paper instead as one of providing a "better" solution in whichever way the authors feel is best described (more efficient, etc.). MINOR COMMENTS: * It seems that the 'else' at the beginning of Algorithm 3, line 9, does not belong there.


Reviews: Budgeted Reinforcement Learning in Continuous State Space

Neural Information Processing Systems

The paper formulates a budgeted Markov decision process (BMDP) able to deal with large search spaces. All reviewers feel the proposed method is novel and interesting, and could be an important step in trying to address some existing problems with "modern" RL approaches.


Budgeted Reinforcement Learning in Continuous State Space

Carrara, Nicolas; Leurent, Edouard; Laroche, Romain; Urvoy, Tanguy; Maillard, Odalric-Ambrym; Pietquin, Olivier

Neural Information Processing Systems

A Budgeted Markov Decision Process (BMDP) is an extension of a Markov Decision Process to critical applications requiring safety constraints. It relies on a notion of risk implemented in the form of an upper bound on a constraint-violation signal that, importantly, can be modified in real time. So far, BMDPs could only be solved in the case of finite state spaces with known dynamics. This work extends the state of the art to continuous state-space environments and unknown dynamics. We show that the solution to a BMDP is the fixed point of a novel Budgeted Bellman Optimality operator.
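As a concrete illustration of the budget-constrained greedy step implied by the abstract, below is a small, hypothetical Python sketch. The function name, the tuple-valued Q estimates, and the fallback rule are assumptions made for illustration; the paper's actual greedy policy is richer (it can, for instance, mix actions), so this is not the authors' implementation.

from typing import Sequence, Tuple

def constrained_greedy_action(q_values: Sequence[Tuple[float, float]], budget: float) -> int:
    """Pick the highest-reward action among those whose expected constraint
    value stays within the budget; q_values[a] = (q_reward, q_constraint)."""
    admissible = [a for a, (_, qc) in enumerate(q_values) if qc <= budget]
    if admissible:
        # Greedy on the reward estimate, restricted to budget-respecting actions.
        return max(admissible, key=lambda a: q_values[a][0])
    # No admissible action: fall back to the least-violating one.
    return min(range(len(q_values)), key=lambda a: q_values[a][1])

# Three actions, budget of 0.2 on the expected constraint signal.
# Action 0 violates the budget, so the greedy choice is action 1 (reward 0.8).
print(constrained_greedy_action([(1.0, 0.5), (0.8, 0.1), (0.3, 0.0)], budget=0.2))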